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Abstract 

This paper studies a formulation of 1-bit Compressive Sensing (CS) problem based on the maximum 
likelihood estimation framework. In order to solve the problem we apply the recently proposed Gradient 
Support Pursuit algorithm, with a minor modification. Assuming the proposed objective function has a 



Stable Restricted Hessian, the algorithm is shown to accurately solve the 1-bit CS problem. Furthermore, 
the algorithm is compared to the state-of-the-art 1-bit CS algorithms through numerical simulations. The 

H 

M results suggest that the proposed method is robust to noise and at mid to low input SNR regime it 

achieves the best reconstruction SNR vs. execution time trade-off. 

T T~^ Index Terms 

> 

Cn 1-bit compressive sensing, sparsity, quantization 



I. Introduction 



Quantization is an indispensable part of digital signal processing and digital communications systems. 
To incorporate Compressive Sensing (CS) methods in these systems, it is thus necessary to analyze and 
evaluate them considering the effect of measurement quantization. In the recent years there is a growing 
interest in quantized CS in the literature [l]-[6], particularly the extreme case of quantization to a single 
bit, known as 1-bit Compressive Sensing [ ], where only the sign of the linear measurements are recorded. 
The advantage of this acquisition scheme is that it can be implemented using simple hardware that is 
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not expensive and can operate at very high sampling rates. The formulation of this problem is also very 
similar to the sparse logistic regression, very useful in machine learning applications [ ]. 

As in standard CS, the algorithms proposed for the 1-bit CS problem can be categorized into convex 
methods and non-convex greedy methods. In [ ] an algorithm is proposed for 1-bit CS that induces 
sparsity through the ^i-norm while penalizes inconsistency with the 1-bit sign measurements via a convex 
regularization term. In a noise-free scenario, the 1-bit measurements do not convey any information about 
the length of the signal. Therefore, the algorithm of [7], as well as other 1-bit CS algorithms, aim at 
accurate estimation of the normalized signal. Requiring the 1-bit CS estimate to lie on the surface of the 
unit-ball imposes a non-convex constraint in methods that perform an (approximate) optimization, even 
those that use the convex £i-norm to induce sparsity. Among greedy 1-bit CS algorithms, an algorithm 
called Matching Sign Pursuit (MSP) is proposed in [ ] based on the CoSaMP algorithm [ ]. This 
algorithm is empirically shown to perform better than the standard CoSaMP algorithm for estimation 
of the normalized sparse signal. In [ ] the Restricted-Step Shrinkage (RSS) algorithm is proposed for 
1-bit CS problems, improving the performance of £i-based algorithms. This algorithm, which is similar 
to trust-region algorithms in non-convex optimization, is shown to converge to a stationary point of the 
objective function regardless of the initialization. More recently, [ ] derived a lower bound on the best 
achievable reconstruction error of any 1-bit CS algorithm in noise-free scenarios. Furthermore, using 
the notion of "binary stable embeddings", they have shown that Gaussian measurement matrices can be 
used for 1-bit CS problems both in noisy and noise-free regime. The Binary Iterative Hard Thresholding 
(BIHT) algorithm is also proposed in [ ] and shown to have favorable performance compared to the 
RSS and MSP algorithms through numerical simulations. For robust 1-bit CS in presence of noise, [13] 
also proposed the Adaptive Outlier Pursuit (AOP) algorithm. In each iteration of the AOP, first the sparse 
signal is estimated similar to BIHT with the difference that the potentially corrupted measurements are 
excluded. Then with the new signal estimate fixed, the algorithm updates the list of likely corrupted 
measurements. The AOP is shown to improve on performance of BIHT through numerical simulations. 
[14] proposed a linear program to solve the 1-bit CS problems in a noise-free scenario. The algorithm is 
proved to provide accurate solutions, albeit using a sub-optimal number of measurements. Furthermore, in 
[8] a convex program is proposed that is robust to noise in 1-bit measurements and achieves the optimal 
number of measurements and in [15] it is shown that non-Gaussian matrices can be used for acquisition 
under certain conditions on the acquired signal. 

In this paper, we formulate the 1-bit CS problem as an sparsity-constrained optimization. As described 
in Section II, the objective function is obtained by adjusting the loss function that arises in the Maximum 



Likelihood Estimation (MLE) formulation of the problem. To solve this optimization problem in Section 
III we propose a slightly modified version of the Gradient Support Pursuit (GraSP) algorithm proposed in 
[16]. In Section IV, the algorithm is shown to yield an approximate solution provided that the objective 
function satisfies certain sufficient conditions. Furthermore, in Section V, we compare the performance 
of our algorithm with the BIHT algorithm and the non-convex variant of the 1-bit CS solver introduced 
in [ .'] through numerical simulations. As an aside, we show that the non-convex solver described in [ ] 
has an explicit solution (see Appendix B). Finally, we discuss and conclude in Section VI. 

Notation: In the remainder of the paper we denote the positive part of a real number x by (z) , . For a 
positive integer k, the set {1, 2, . . . , k} is denoted by [A;]. The indicator function is denoted by 1 (•). Vectors 
and matrices are denoted by boldface characters. With the exception of J\f which we use to denote the 
normal distribution as in J\f (0, 1), calligraphic letters are reserved for denoting sets. The support set (i.e., 
the set of non-zero coordinates) of a vector x is denoted by supp (x). The best /c-term approximation of 
a vector v is denoted by v&. Depending on the context vr fc i may also be a /c-dimensional vector denoting 
the restriction of v to its k largest entries in magnitudes (i.e., truncated best /c-term approximation of 
v). Restriction of an n-dimensional vector v to its entries corresponding to an index set X C [n] is 
denoted by v| z . Similarly Aj denotes the restriction of a matrix A to the columns enumerated by X. 
Restriction of the identity matrix to the columns enumerated by X is particularly denoted by Pj. The 
operator norm of A with respect to the ^-norm is denoted by ||^4||. For square matrices A and B we 
write B ^ A to state that A — B is positive semidefinite. We use V/ (•) and V 2 / (•) to denote the 
gradient and the Hessian of a twice continuously differentiable function / : W 1 \- > R. For an index set 
X C [n], the restriction of the gradient to the entries selected by X and the restriction of the Hessian 
to the entries selected by X x X are denoted by Vx/ (•) and Vj/ (•)> respectively. Finally, numerical 
superscripts within parentheses denote the iteration index. 

II. Problem Formulation 

We cast the 1-bit CS problem in the framework of statistical parametric estimation which is also 
considered in [ ]. In 1-bit CS, binary measurements y€{±l}ofa signal x* G IR 71 are collected based 
on the model 

y = sgn((a,x*) + e), (1) 



where a is a measurement vector and e denotes an additive noise with distribution A/"(0,<r 2 ). It is 
straightforward to show the conditional likelihood of y given a and signal x can be written as 

Pr{y|a;x} = d>('y^ c) 

with <J> (•) denoting the standard normal cumulative distribution function (CDF). Then, for measurement 
pairs {(aj,2/j)}£L]tlie MLE loss function is given by 



/„( X): =4|>(*(,,<^)). 



The estimator should exploit the sparsity of the solution and sparsely minimize the function above. Note, 
however, that at high Signal-to-Noise Ratio (SNR) this function does not lend itself easily to optimization. 
To observe this behavior, rewrite f MLE as 



/mle (x) = — y^ g n [ Vi ( a i? F — r ) ) 

m ^ \ \ ll x *ll 2 // 



|x* II 

where rj := - 2 is the SNR and g^ (t) := — log <I> (uit) for all uj > 0. As r/ — > +00 the function g v (t) 
tends to 

t >0 

Soo (t) := < log 2 t = • 

+00 t < 
Therefore, as the SNR increases to infinity / MLE (x) tends to a sum of discontinuous constant functions 
that do not uniquely identify the solution and are difficult to handle in practice. This is essentially the 
same problem as the amplitude ambiguity demonstrated in the original formulation in [ ]. Furthermore, 
whether the noise level is too low or the signal is too strong relative to the noise, in a high (but finite) 
SNR scenario the measurement vectors are likely to be consistent with the noise-free measurements of 
the true signal x*. In these cases, / MLE (tx*) can be made arbitrarily close to the zero lower bound as 
t — >• +00. Therefore, / MLE would not have a bounded minimizer. This can be interpreted as an infinite 
estimation error. 

To avoid the problems mentioned above we consider a modified loss function 



1 ra 

/o(x):= rbg($( W (a il x))) I (2) 

m *-^ 

4 = 1 

while we merely use an alternative formulation of (1) given by 

y = sgn (77 (a, x*) + e) , 



in which rj > denotes the true SNR, x* is assumed to be unit-norm, and e ~ J\f (0, 1). The aim is accu- 
rate estimation of the unit-norm signal x* which is assumed to be s-sparse. Disregarding computational 
complexity, the candidate estimator would be 

argmin /o (x) s.t. ||x|| < s and ||x|| 2 < 1. (3) 

However, finding the exact solution (3) may be computationally intractable, thereby we merely focus on 
approximate solutions to this optimization problem. 

III. Algorithm 

In this section we introduce a modified version of the GraSP algorithm, outlined in Algorithm 1, 
for estimation of bounded sparse signals associated with a cost function. While in this paper the main 
goal is to study the 1-bit CS problem and in particular the objective function described by (2), we state 
performance guarantees of Algorithm 1 in more general terms. As in GraSP, in each iteration first the 2s 
coordinates at which the gradient of the cost function at the iterate x™ has the largest magnitudes are 
identified. These coordinates, denoted by Z, are then merged with the support set of xW to obtain the 
set T in the second step of the iteration. Then, as expressed in line 3 of Algorithm 1 , a crude estimate b 
is computed by minimizing the cost function over vectors of length no more than r whose supports are 
subsets of T ■ Note that this minimization would be a convex program and therefore tractable, provided 
that the sufficient conditions proposed in Section IV hold. In the final step of the iteration (i.e., line 4) the 
crude estimate is pruned to its best s-term approximation to obtain the next iterate x( 4+1 ). By definition 
we have ||b|| 2 < r, thus the new iterate remains in the feasible set (i.e., ||x^ +1 ^|L < r). 

IV. Accuracy Guarantees 

In order to provide accuracy guarantees for Algorithm 1, we rely on the notion of SRH described in 
[16] with a slight modification in its definition. The original definition of SRH basically characterizes 
the cost functions that have bounded curvature over sparse canonical subspaces, possibly at locations 
arbitrarily far from the origin. However, we only require the bounded curvature condition to hold at 
locations that are within a sphere around the origin. More precisely, we redefine the SRH as follows. 

Definition 1 (Stable Restricted Hessian). Suppose that / : R n \- > R is a twice continuously differentiable 
function and let k < n be a positive integer. Furthermore, let a k (x) and /3f. (x) be in turn the largest 
and smallest real numbers such that 

Pk (x) || A|| 2 2 < A T V 2 / (x) A< a k (x) || A|| 2 2 , (4) 



Algorithm 1: GraSP with Bounded Thresholding 



s desired sparsity level 

input : r radius of the feasible set 

/ (•) the cost function 
ti — 
xW <— 
repeat 
l 



S«-burp([V/(xM)]J 

T i — supp (xW) U Z 

b i — argmin / (x) s.t. x|-p = and ||x|| 2 < r 

x (*+i) ^_ hs 

ti — t + 1 
until halting condition holds 
return xW 



holds for all A and x that obey |supp (A) U supp (x)| < k and ||x|| 2 < r. Then / is said to have an 
Stable Restricted Hessian of order k with constant \i\~ > 1 in a sphere of radius r > 0, or for brevity 
(//fe,r)-SRH, if 1 < ctk (x) /{3k ( x ) < l^k for all /c-sparse x with ||x|| 2 < r. 

Theorem 1. Let x be a vector such that ||x|| < s and ||x|| 2 < r. If the cost function f (x) have (//4 S , r ) _ 
57?// corresponding to the curvature bounds a^ s (x) ancf /?4 S (x) in (4), then iterates of Algorithm J obey 

x (*+i)_x| < (/4-/x 4s )|xW-x| +2(^ 4s + l)e, 

where e obeys ||[V/(x)] 3 |L < e/34 S (x) for all x wrY/z ||x|| < 4s and ||x|| 2 < r. 

The immediate implication of this theorem is that if the 1-bit CS loss /o (x) has (/X4 S , 1)-SRH with 
/Ms < l± ^ then we have ||xW - x *|| 2 < 2" t ||x' ,r || 2 + 2 (3 + y/3) e. 

Proof of Theorem 1 is almost identical to the proof of Theorem 1 in [16]. For brevity we will provide 
a proof sketch in Appendix A and elaborate only on the more distinct parts of the proof and borrow the 
remaining parts from [16]. 



V. Simulations 

In our simulations using synthetic data we considered signals of dimensionality n = 1000 that are s- 
sparse with s = 10, 20, or 30. The non-zero entries of the signal constitute a vector randomly drawn from 
the surface of the unit Euclidean ball in R s . The m x n measurement matrix has iid standard Gaussian 
entries with m varying between 100 and 2000 in steps of size 100. We also considered three different 
noise variances a 2 corresponding to input SNR rj = 20dB, lOdB, and OdB. Figures 1-5 illustrate the 
average performance of the considered algorithm over 200 trials versus the sampling ratio (i.e., m/ri). In 
these figures, the results of Algorithm 1 considering /o and Jmle as the objective function are demarcated 
by GraSP and GrasP-r/, respectively. Furthermore, the results corresponding to BIHT algorithm with one- 
sided i\ and £2 objective functions are indicated by BIHT and BIHT-^2, respectively. We also considered 
the £q -constrained optimization proposed by [ ] which we refer to as PV-^o- While [ ] mostly focused 
on studying the convex relaxation of this method using £i-norm, as shown in Appendix B the solution to 
PV-£o can be derived explicitly in terms of the one-bit measurements, the measurement matrix, and the 
sparsity level. We do not evaluate the convex solver proposed in [ ] because we did not have access to 
an efficient implementation of this method. Furthermore, this convex solver is expected to be inferior to 
PV-£o in terms of accuracy because it operates on a feasible set with larger mean width [8, Theorem 1.1]. 
With the exception of the non-iterative PV-£o, the other four algorithms considered in our simulations 
are iterative; they are configured to halt when they produce an estimate whose 1-bit measurements and 
the real 1-bit measurements have a Hamming distance smaller than an ^-dependent threshold. 

Figure 1 illustrates performance of the considered algorithms in terms of the angular error between 
the normalized estimate x and the true signal x* defined as AE (x) := - cos -1 (x, x*). As can be seen 
from the figure, with higher input SNR (i.e., rj) and less sparse target signals the algorithms incur larger 
angular error. While there is no significant difference in performance of GaSP, GraSP-r/, and BIHT-^ 2 
for the examined values of r\ and s, the BIHT algorithm appears to be sensitive to 77. At 77 = 20dB and 
low sampling ratios BIHT outperforms the other methods by a noticeable margin. However, for more 
noisy measurements BIHT loses its advantage and at r\ = OdB it performs even poorer than the PV-^o- 
PV-£o never outperforms the two variants of GraSP or the BIHT-^2, but the gap between their achieved 
angular error decreases as the measurements become more noisy. 

The reconstruction SNR of the estimates produced by the algorithms are compared in Figure 2. The 
reconstruction SNR conveys the same information as the angular error as it can be calculated through 
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the formula 

R-SNR(X) := -201og 10 ||x-x*|| 2 

= -10 log 10 (2-2 cos AE (x)) . 

However, it magnifies small differences between the algorithms that were difficult to trace using the 

angular error. For example, it can be seen in Figure 2 that at r\ = 20dB and s = 10, GraSP-r/ has an 

advantage (of up to 2dB) in reconstruction SNR. 

Furthermore, we evaluated performance of the algorithms in terms of identifying the correct support 

set of the target sparse signal by are comparing their achieved False Negative Rate 

FNR= |supp(x*)\supp(x)| 
|supp (x*)| 

and False Positive Rate 

|supp(x)\supp(x*)| 



FPR 



n — |supp (x*)| 

Figures 3 and 4 illustrate these rates for the studied algorithms. It can be seen in Figure 3 that at 
7] = 20dB, BIHT achieves a FNR slightly lower than that of the variants of GraSP, whereas PV-£o and 
BIHT-^2 rank first and second, respectively, in the highest FNR at a distant from the other algorithms. 
However, as rj decreases the FNR of BIHT deteriorates relative to the other algorithms while BIHT-^2 
shows improved FNR. The GraSP variants exhibit better performance overall at smaller values of 77 
especially with s = 10, but for 77 = lOdB and at low sampling ratios BIHT attains a slightly better FNR. 
The relative performance of the algorithms in terms of FPR, illustrated in Figure 4, is similar. 

We also compared the algorithms in terms of their average execution time (T) measured in seconds. 
The simulation was ran on a PC with an AMD Phenom™II X6 2.60GHz processor and 8.00GB of 
RAM. The average execution time of the algorithms, all of which are implemented in MATLAB®, is 
illustrated in 5 in log scale. It can be observed from the figure that PV-^o is the fastest algorithm which 
can be attributed to its non-iterative procedure. Furthermore, in general BIHT-^2 requires significantly 
longer time compared to the other algorithms. The BIHT, however, appears to be the fastest among the 
iterative algorithms at low sampling ratio or at large values of rj. The GraSP variants generally run at 
similar speed, while they are faster than BIHT at low values of 77 and high sampling ratios. 

VI. Conclusion 

In this paper we revisited a formulation of the 1-bit CS problem based on the maximum likelihood 
estimation. Furthermore, we applied a variant of the GraSP algorithm [16] to this problem. We showed 
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through numerical simulations that the proposed algorithms have robust performance in presence of 
noise. While at high levels of input SNR these algorithms are outperformed by a narrow margin by the 
competing algorithms, in low input SNR regime our algorithms show a solid performance at reasonable 
computational cost. 



Appendix A 
Proofs 

To prove Theorem 1 we use the following two lemmas. We omit the proofs since they can be easily 
adapted from Lemmas 1 and 2 in [16] using straightforward changes. It suffices to notice that 

1) the proof in [16] still holds if the estimation errors are measured with respect to the true sparse 
minimizer or any other feasible (i.e., s-sparse) point, rather than the statistical true parameter, and 

2) the iterates and the crude estimates will always remain in the sphere of radius r centered at the 
origin where the SRH applies. 

In what follows f Q ctk (rx + (1— r) x) dr and f Q j3k (rx + (1 — r) x) dr are denoted by 5fc (x) and 
j3k (x), respectively. We also define -% (x) := 5^ (x) — /3& (x). 



Lemma 1. Let Z be the index set defined in Algorithm I and 1Z denote the set supp (x' 4 ) 
the iterate x'*' obeys 



Then 



Jt) 



x 



< 



74s(xW)+72s(xW) 



fos (xW) 



x 



(t) 



+ 



|Vft\3/(x)|| 2 + ||V,gyK/(x) 
As (xW) 



Lemma 2. The vector b defined at line 3 of Algorithm 1 obeys 

,,-, h|| < l|Vr/(x)|| 2 74, (b) „ 

" /3 4s (b) 2/3 4s (b) 

Proof of Theorem 1: Since 2C7we have T C Z c and thus 



(* (t) -*)\A>-\\( 



r(0 



X 



T 



Ir c ii2- 
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Then it follows from Lemma 1 that 



I x lr c ll2 - 



+ 



74s X 



(:')' 



xW-x 



/3 4s (x(*)) 

||V w ^/(x)|| 2 + ||Vz\ w /(x)|| 2 



# 



< (/i 4s - 1) 



Is 



X W - x 



+ 2e, 



(5) 



where we used the fact that a 4s > a2s and /3 4s < fos to simplify the expressions. Furthermore, we have 

= lib., — x|L 



x( i+1 )-x 



x lrll2 "r - ll x lr c lb 



< ||b. 

< ||b s — b|| 2 + ||b — x|-r|| + || x| 



ini2 



^ ^ 1 1 1) 5C J *y I J q ~r ||^| 7~ c 1 1 9 1 



r c ii2 



where the last inequality holds because b s is the best s-term approximation of b. Hence, it follows from 
Lemma 2 that 



x( t+1 )-x 



< 2 



||V r /(x)|| 2 5 4 s(b), 



+ 



/3 4s (b) /3 4s (b) 

< 2e + ^ 4s ||x| r c|| 2 . 

Then applying (5) and simplifying the resulting inequality yield 

< 2e + ii±s ( (/i 4s - 1 



ir c ii2 



x (*+l) _ x 



x« - X 



+ 2e 



< (uls - m») 



x« - x 



+ 2 (fi As + 1) e, 



which is the desired result. 



Lemma 3 (Bounded Sparse Projection). For any x£l" the vector max \ 1, n r N > x s is a solution to 

L ll x s|| 2 J 

?/ie minimization 



argmin ||x — w|L s.t. ||w|| 2 < r and ||w|L < s. 



(6) 



Proof: Given an index set S C [n] we can write ||x — w| 



x-wLL + x Lc L for vectors 



w with supp (w) C S. Therefore, the solution to 

argmin ||x — w|| 2 s.t. ||w|| 2 < r and supp (w) C S 
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is simply obtained by projection of x| 5 onto the sphere of radius r, i.e., 

Ps (x) = max < 1 , 



(. Il x l5ll2 
Therefore, to find a solution to (6) it suffices to find the index set S with |<S| = s and thus the 

corresponding P5 (x) that minimize ||x — P5 (x)|| 2 . Note that we have 

||x-P lS (x)|| 2 2 = ||x| 5 -P 5 (x)|| 2 2 + ||x| 5 c|| 2 2 

l\\ I II A 2 1 II I II 2 

1 11 2 11 1 11 2 11 1 11 

|-*-H2 Il- x -l5ll2 ' H- x -l5ll2 ^ 

I II 2 9 n II I II Mill 

I x ll2 + r 2r || xl^l^ , ||x| 5 || 2 >r 

.... m9 im9 m9 o 

For all valid S with || x| 5 1| 2 < r we have ||x|| 2 — || x^ || 2 > ||x|| 2 — r . Similarly, for all valid S with 

11 m2 o iim2 o m2m m2 

x c L < r we have x, +r -2r xUL < xL — r . Furthermore, both xl — xLL and 

II 10 II Z II II Z II IO II Z — iiiiz iiii^ ll 10 II z 

IWI2 + r 2 — 2r||x| (S || 2 are decreasing functions of Hx^Hg. Therefore, ||x — P5 (x)|| 2 is a decreasing 
function of Hx^l^. Hence, ||x — P5 (x)|| 2 attains its minimum at S = supp(x s ). ■ 

Appendix B 
On non-convex formulation of [8] 

[8] derived accuracy guarantees for 

arg max (y, Ax) s.t. x 6 fC 

X 

as a solver for the 1-bit CS problem, where /C is a subset of the unit Euclidean ball. While their result 
[8, Theorem 1.1] applies to both convex and non-convex sets /C, the focus of their work has been on the 
set K that is the intersection of a centered £i-ball and the unit Euclidean ball. Our goal, however, is to 
examine the other interesting choice of /C, namely the intersection of canonical sparse subspaces and the 
unit Euclidean ball. The estimator in this case can be written as 

arg max (y, Ax) s.t. ||x|| < s and ||x|| 2 < 1. (7) 

We show that a solution to the optimization above can be obtained explicitly. 

Lemma 4. A solution to (7) is x = (A T y) s /|| (A T y)J| 

Proof: For X C [n] define 

x(X) := arg max (y, Ax) s.t. xjjc = and ||x|| 2 < 1. 
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Furthermore, choose 

X G argmax (y, Ax (X)) s.t. X C [n] and |X| < s. 

Then x(X) would be a solution to (7). Using the fact that (y, Ax) = (A T y,x), straightforward 
application of Cauchy-Schwarz inequality shows that x (X) = (A T y)L/|| (A T y)L|| for which we 
have 

<y,Ax(X)> = ||(A T y)y 2 . 

Thus, we obtain X = supp ((A T y) ) and thereby x (XJ = x, which proves the claim. ■ 
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